
[gpt-oss] use vLLM instead of openai types for streaming#25186

Merged
hmellor merged 6 commits into vllm-project:main from qandrew:gpt-oss-streaming-3-wrap-types
Sep 30, 2025

Conversation

Contributor

@qandrew qandrew commented Sep 18, 2025

Purpose

Test Plan

Test Result

client

curl http://localhost:20001/v1/responses   -H "Content-Type: application/json"   -N   -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "user",
            "content": "Hello."
        }
    ],
    "temperature": 0.7,
    "max_output_tokens": 256,
    "stream": true, 
    "enable_response_messages": true
}'  

ResponseCompletedEvent output (see that the input/output messages are there)

{
    "response": {
        "id": "resp_aed05365be36434fb820a6eab52bfc39",
        "created_at": 1758328028,
        "incomplete_details": null,
        "instructions": null,
        "metadata": null,
        "model": "/data/users/axia/checkpoints/gpt-oss-120b",
        "object": "response",
        "output": [
            {
                "id": "rs_2ebdf86c76a54ddb85c5a867c31f9b77",
                "summary": [],
                "type": "reasoning",
                "content": [
                    {
                        "text": "The user says \"Hello.\" Probably a greeting. Should respond politely, maybe ask how can help. Keep brief friendly.",
                        "type": "reasoning_text"
                    }
                ],
                "encrypted_content": null,
                "status": null
            },
            {
                "id": "msg_30edf4e235b546078b529ebfaa6d7bee",
                "content": [
                    {
                        "annotations": [],
                        "text": "Hello! How can I assist you today?",
                        "type": "output_text",
                        "logprobs": null
                    }
                ],
                "role": "assistant",
                "status": "completed",
                "type": "message"
            }
        ],
        "input_messages": [
            {
                "author": {
                    "role": "system",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": null,
                "recipient": null,
                "content_type": null
            },
            {
                "author": {
                    "role": "user",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": null,
                "recipient": null,
                "content_type": null
            }
        ],
        "output_messages": [
            {
                "author": {
                    "role": "assistant",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": "analysis",
                "recipient": null,
                "content_type": null
            },
            {
                "author": {
                    "role": "assistant",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": "final",
                "recipient": null,
                "content_type": null
            }
        ],
        "parallel_tool_calls": true,
        "temperature": 0.7,
        "tool_choice": "auto",
        "tools": [],
        "top_p": 1.0,
        "background": false,
        "max_output_tokens": 256,
        "max_tool_calls": null,
        "previous_response_id": null,
        "prompt": null,
        "reasoning": null,
        "service_tier": "auto",
        "status": "completed",
        "text": null,
        "top_logprobs": null,
        "truncation": "disabled",
        "usage": {
            "input_tokens": 67,
            "input_tokens_details": {
                "cached_tokens": 64
            },
            "output_tokens": 43,
            "output_tokens_details": {
                "reasoning_tokens": 25,
                "tool_output_tokens": 0
            },
            "total_tokens": 110
        },
        "user": null
    },
    "sequence_number": 45,
    "type": "response.completed"
}
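With `enable_response_messages` set, a client can read the full `input_messages` / `output_messages` off the terminal `response.completed` event. A minimal sketch using plain `json` parsing; the payload below is trimmed down from the example output above:

```python
import json

# Trimmed response.completed payload, adapted from the example output above.
event = json.loads("""
{
    "type": "response.completed",
    "sequence_number": 45,
    "response": {
        "output_messages": [
            {"author": {"role": "assistant"}, "channel": "analysis"},
            {"author": {"role": "assistant"}, "channel": "final"}
        ]
    }
}
""")

if event["type"] == "response.completed":
    # Each output message carries its harmony channel: "analysis" for
    # reasoning, "final" for the user-visible answer.
    channels = [m["channel"] for m in event["response"]["output_messages"]]
    print(channels)  # → ['analysis', 'final']
```

This only sketches field access on the streamed JSON; the server-side types are the vLLM ones this PR introduces.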


@qandrew qandrew changed the title [gptoss] use vLLM streaming types [gpt-oss] use vLLM streaming types Sep 18, 2025
@mergify mergify Bot added frontend gpt-oss Related to GPT-OSS models labels Sep 18, 2025
Contributor

mergify Bot commented Sep 19, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @qandrew.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Sep 19, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew force-pushed the gpt-oss-streaming-3-wrap-types branch from 0c32be0 to 7b82dbf Compare September 20, 2025 00:18
@mergify mergify Bot removed the needs-rebase label Sep 20, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew changed the title [gpt-oss] use vLLM streaming types [gpt-oss] use vLLM instead of openai types for streaming Sep 20, 2025
@qandrew qandrew marked this pull request as ready for review September 21, 2025 04:50
Contributor Author

qandrew commented Sep 21, 2025

cc @alecsolder, @yeqcharlotte, @chaunceyjiang, ready for review. I think the readthedocs CI failure is a flake.

@lacora
Contributor

lacora commented Sep 22, 2025

Maybe we need to think about whether to allow input_messages / output_messages for sync responses only, or for streaming as well; appending long messages kills the efficiency that streaming provides. cc @yeqcharlotte

Contributor Author

qandrew commented Sep 22, 2025

Maybe we need to think about whether to allow input_messages / output_messages for sync responses only, or for streaming as well; appending long messages kills the efficiency that streaming provides. cc @yeqcharlotte

That's a good point! This PR only modifies events in streaming right now (ResponseCreatedEvent, ResponseInProgressEvent, ResponseCompletedEvent), so none of the deltas / intermediate events will get additional payload. Or I could remove it from ResponseCreatedEvent and ResponseInProgressEvent, but maybe it's fine to keep it for now for consistency?
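To illustrate why keeping the extra fields off the deltas matters: only the full-response events carry the response object, so a consumer dispatches on `type` and the per-token deltas stay small. A hedged sketch over synthetic event dicts (the `response.output_text.delta` / `response.completed` names follow the Responses API event types; `collect_text` is a hypothetical helper, not part of this PR):

```python
def collect_text(events):
    """Accumulate output text from delta events; read the full messages
    only from response.completed, which carries the whole response."""
    text_parts = []
    output_messages = None
    for event in events:
        if event["type"] == "response.output_text.delta":
            # Deltas carry only the incremental text, no message payload.
            text_parts.append(event["delta"])
        elif event["type"] == "response.completed":
            # The terminal event carries the full response object,
            # including output_messages when enabled.
            output_messages = event["response"].get("output_messages")
    return "".join(text_parts), output_messages

# Synthetic stream: two deltas, then the completed event.
stream = [
    {"type": "response.output_text.delta", "delta": "Hello! "},
    {"type": "response.output_text.delta", "delta": "How can I assist you today?"},
    {"type": "response.completed",
     "response": {"output_messages": [{"channel": "final"}]}},
]
text, msgs = collect_text(stream)
```

Under this shape, per-token events stay lightweight regardless of how large the accumulated messages grow.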

Collaborator

@yeqcharlotte yeqcharlotte left a comment


LGTM. Could you add basic skeleton tests that load the type? cc:
@chaunceyjiang @aarnphm PTAL too, thanks!

ResponseWebSearchCallCompletedEvent,
ResponseWebSearchCallInProgressEvent,
ResponseWebSearchCallSearchingEvent)
# yapf: enable
Collaborator


Is the format change intentional?

Contributor Author


Yep, yapf/isort had a conflict; it seems like that happens pretty often...

Collaborator


can we keep the original format?

Contributor Author


@yeqcharlotte we can't. The import below introduces the conflict; I tried to keep the original format, but the conflicts come back. You can see in serving_responses.py that "# yapf conflicts with isort for this block" happens often:

from openai.types.responses import (
    ResponseInProgressEvent as OpenAIResponseInProgressEvent)

Collaborator

@yeqcharlotte yeqcharlotte left a comment


LGTM! thanks for the change!

@github-project-automation github-project-automation Bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 27, 2025
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) September 27, 2025 00:10
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 27, 2025
Signed-off-by: Andrew Xia <axia@fb.com>
auto-merge was automatically disabled September 30, 2025 20:51

Head branch was pushed to by a user without write access

@hmellor hmellor enabled auto-merge (squash) September 30, 2025 22:09
@hmellor hmellor merged commit 5db1870 into vllm-project:main Sep 30, 2025
45 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>

Labels

frontend gpt-oss Related to GPT-OSS models ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

Development


4 participants